Лабораторная работа #1

Построение идеальной хеш-таблицы (без коллизий)

Мартиросян Елизавета, Максимов Антон


In [131]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
sns.set_style("whitegrid")

idealHashing_results_int = pd.read_csv('results/idealHashing/int.out', sep=';', index_col=False)
twoTieredApproach_results_int = pd.read_csv('results/twoTieredApproach/int.out', sep=';', index_col=False)
graphHashing_results_int = pd.read_csv('results/graphHashing/int.out', sep=';', index_col=False)
idealHashing_results_string = pd.read_csv('results/idealHashing/string.out', sep=';', index_col=False)
twoTieredApproach_results_string = pd.read_csv('results/twoTieredApproach/string.out', sep=';', index_col=False)
graphHashing_results_string = pd.read_csv('results/graphHashing/string.out', sep=';', index_col=False)
idealHashing_results_real = pd.read_csv('results/idealHashing/real.out', sep=';', index_col=False)
twoTieredApproach_results_real = pd.read_csv('results/twoTieredApproach/real.out', sep=';', index_col=False)
graphHashing_results_real = pd.read_csv('results/graphHashing/real.out', sep=';', index_col=False)
In [2]:
idealHashing_results_int
Out[2]:
p size hashing time(microseconds/10^-6 sec) average search time(microseconds/10^-6 sec) processed memory(KB) average number attempts
0 1 50 17.2 0.020000 28.0 1.4
1 1 100 60.8 0.024000 63.2 1.4
2 1 500 1398.8 0.034800 1958.4 1.6
3 1 1000 5543.0 0.045200 7816.0 1.8
4 1 2000 20814.6 0.056300 31256.0 1.4
5 1 3000 46180.0 0.054400 70320.0 3.2
6 1 5000 127108.0 0.068000 195320.0 1.6
7 2 50 26.0 0.020000 28.0 1.8
8 2 100 107.8 0.026000 160.0 1.4
9 2 500 2554.4 0.029600 3912.0 1.2
10 2 1000 10107.6 0.036200 15632.0 1.2
11 2 2000 41159.8 0.047700 62504.0 1.2
12 2 3000 90199.8 0.058067 140632.0 1.6
13 3 50 44.4 0.028000 58.4 2.0
14 3 100 161.0 0.030000 234.4 1.2
15 3 500 4094.8 0.041600 5864.0 1.6
16 3 1000 16875.4 0.058800 23440.0 1.2
17 3 2000 63713.6 0.062700 93752.0 1.6
18 3 3000 137051.0 0.067667 210944.0 1.0
19 3 5000 539356.0 0.113600 585944.0 1.2
20 4 50 232.8 0.148000 62.4 1.4
21 4 100 891.2 0.110000 320.0 1.0
22 4 500 32748.4 0.064800 7816.0 1.0
23 4 1000 109234.0 0.096400 31256.0 1.6
24 4 5000 2415700.0 0.204800 781256.0 1.4


Квадратичный подход к построению идеальной хеш-таблицы

Зависемости при фиксированном размере входных данных

Зависемость времени построения от размера хэш таблицы

In [4]:
g = sns.lmplot(x="p", y="hashing time(microseconds/10^-6 sec)", hue="size", col="size",
               data=idealHashing_results_int, height=3, aspect=1, order=1)
In [5]:
g = sns.lmplot(x="p", y="hashing time(microseconds/10^-6 sec)", hue="size", col="size",
               data=idealHashing_results_string, height=3, aspect=1, order=1)
In [132]:
g = sns.lmplot(x="p", y="hashing time(microseconds/10^-6 sec)", hue="size", col="size",
               data=idealHashing_results_real, height=3, aspect=1, order=1)

Зависемость затраченной памяти от размера хэш таблицы

In [7]:
g = sns.lmplot(x="p", y="processed memory(KB)", hue="size", col="size",
               data=idealHashing_results_int, height=3, aspect=1, order=2)
In [8]:
g = sns.lmplot(x="p", y="processed memory(KB)", hue="size", col="size",
               data=idealHashing_results_string, height=3, aspect=1, order=2)
In [133]:
g = sns.lmplot(x="p", y="processed memory(KB)", hue="size", col="size",
               data=idealHashing_results_real, height=3, aspect=1, order=2)

Зависемость среднего времени поиска от размера хэш таблицы

In [9]:
g = sns.lmplot(x="p", y="average search time(microseconds/10^-6 sec)", hue="size", col="size",
               data=idealHashing_results_int, height=3, aspect=1, order=1)
In [10]:
g = sns.lmplot(x="p", y="average search time(microseconds/10^-6 sec)", hue="size", col="size",
               data=idealHashing_results_string, height=3, aspect=1, order=1)
In [134]:
g = sns.lmplot(x="p", y="average search time(microseconds/10^-6 sec)", hue="size", col="size",
               data=idealHashing_results_real, height=3, aspect=1, order=1)

Зависемости от размеров входных данных (p = const)

Зависемость времени построения от размера хэш таблицы

In [11]:
g = sns.lmplot(x="size", y="hashing time(microseconds/10^-6 sec)", hue="p", col="p",
               data=idealHashing_results_int, height=3, aspect=1, order=1)
In [12]:
g = sns.lmplot(x="size", y="hashing time(microseconds/10^-6 sec)", hue="p", col="p",
               data=idealHashing_results_string, height=3, aspect=1, order=1)
In [135]:
g = sns.lmplot(x="size", y="hashing time(microseconds/10^-6 sec)", hue="p", col="p",
               data=idealHashing_results_real, height=3, aspect=1, order=1)

Зависемость затраченной памяти от размера хэш таблицы

In [13]:
g = sns.lmplot(x="size", y="processed memory(KB)", hue="p", col="p",
               data=idealHashing_results_int, height=3, aspect=1, order=2)
In [137]:
g = sns.lmplot(x="size", y="processed memory(KB)", hue="p", col="p",
               data=idealHashing_results_string, height=3, aspect=1, order=2)
In [138]:
g = sns.lmplot(x="size", y="processed memory(KB)", hue="p", col="p",
               data=idealHashing_results_real, height=3, aspect=1, order=2)

Зависемость среднего времени поиска от размера хэш таблицы

In [14]:
g = sns.lmplot(x="size", y="average search time(microseconds/10^-6 sec)", hue="p", col="p",
               data=idealHashing_results_int, height=3, aspect=1, order=1)
In [139]:
g = sns.lmplot(x="size", y="average search time(microseconds/10^-6 sec)", hue="p", col="p",
               data=idealHashing_results_string, height=3, aspect=1, order=1)
In [140]:
g = sns.lmplot(x="size", y="average search time(microseconds/10^-6 sec)", hue="p", col="p",
               data=idealHashing_results_real, height=3, aspect=1, order=1)

Двух-уровневый подход к построению идеальной хеш-таблицы

Зависемости при фиксированном размере входных данных

Зависемость времени построения от размера хэш таблицы

In [15]:
g = sns.lmplot(x="p", y="hashing time(microseconds/10^-6 sec)", hue="size", col="size",
               data=twoTieredApproach_results_int, height=3, aspect=1, order=1)
In [16]:
g = sns.lmplot(x="p", y="hashing time(microseconds/10^-6 sec)", hue="size", col="size",
               data=twoTieredApproach_results_string, height=3, aspect=1, order=1)
In [141]:
g = sns.lmplot(x="p", y="hashing time(microseconds/10^-6 sec)", hue="size", col="size",
               data=twoTieredApproach_results_real, height=3, aspect=1, order=1)

Зависемость затраченной памяти от размера хэш таблицы

In [17]:
g = sns.lmplot(x="p", y="processed memory(KB)", hue="size", col="size",
               data=twoTieredApproach_results_int, height=3, aspect=1, order=1)
In [142]:
g = sns.lmplot(x="p", y="processed memory(KB)", hue="size", col="size",
               data=twoTieredApproach_results_string, height=3, aspect=1, order=1)
In [143]:
g = sns.lmplot(x="p", y="processed memory(KB)", hue="size", col="size",
               data=twoTieredApproach_results_real, height=3, aspect=1, order=1)

Зависемость среднего времени поиска от размера хэш таблицы

In [18]:
g = sns.lmplot(x="p", y="average search time(microseconds/10^-6 sec)", hue="size", col="size",
               data=twoTieredApproach_results_int, height=3, aspect=1, order=1)
In [19]:
g = sns.lmplot(x="p", y="average search time(microseconds/10^-6 sec)", hue="size", col="size",
               data=twoTieredApproach_results_string, height=3, aspect=1, order=1)
In [144]:
g = sns.lmplot(x="p", y="average search time(microseconds/10^-6 sec)", hue="size", col="size",
               data=twoTieredApproach_results_real, height=3, aspect=1, order=1)

Зависемости от размеров входных данных (p = const)

Зависемость времени построения от размера хэш таблицы

In [20]:
g = sns.lmplot(x="size", y="hashing time(microseconds/10^-6 sec)", hue="p", col="p",
               data=twoTieredApproach_results_int, height=3, aspect=1, order=1)
In [145]:
g = sns.lmplot(x="size", y="hashing time(microseconds/10^-6 sec)", hue="p", col="p",
               data=twoTieredApproach_results_string, height=3, aspect=1, order=1)
In [146]:
g = sns.lmplot(x="size", y="hashing time(microseconds/10^-6 sec)", hue="p", col="p",
               data=twoTieredApproach_results_real, height=3, aspect=1, order=1)

Зависемость затраченной памяти от размера хэш таблицы

In [21]:
g = sns.lmplot(x="size", y="processed memory(KB)", hue="p", col="p",
               data=twoTieredApproach_results_int, height=3, aspect=1, order=1)
In [22]:
g = sns.lmplot(x="size", y="processed memory(KB)", hue="p", col="p",
               data=twoTieredApproach_results_string, height=3, aspect=1, order=1)
In [147]:
g = sns.lmplot(x="size", y="processed memory(KB)", hue="p", col="p",
               data=twoTieredApproach_results_real, height=3, aspect=1, order=1)

Зависемость среднего времени поиска от размера хэш таблицы

In [23]:
g = sns.lmplot(x="size", y="average search time(microseconds/10^-6 sec)", hue="p", col="p",
               data=twoTieredApproach_results_int, height=3, aspect=1, order=1)
In [24]:
g = sns.lmplot(x="size", y="average search time(microseconds/10^-6 sec)", hue="p", col="p",
               data=twoTieredApproach_results_string, height=3, aspect=1, order=1)
In [148]:
g = sns.lmplot(x="size", y="average search time(microseconds/10^-6 sec)", hue="p", col="p",
               data=twoTieredApproach_results_real, height=3, aspect=1, order=1)

Графовый подход к построению идеальной хеш-таблицы

Зависемости при фиксированном размере входных данных

Зависемость времени построения от размера хэш таблицы

In [25]:
g = sns.lmplot(x="p", y="hashing time(microseconds/10^-6 sec)", hue="size", col="size",
               data=graphHashing_results_int, height=3, aspect=1, order=1)
In [26]:
g = sns.lmplot(x="p", y="hashing time(microseconds/10^-6 sec)", hue="size", col="size",
               data=graphHashing_results_string, height=3, aspect=1, order=1)
In [149]:
g = sns.lmplot(x="p", y="hashing time(microseconds/10^-6 sec)", hue="size", col="size",
               data=graphHashing_results_real, height=3, aspect=1, order=1)

Зависемость затраченной памяти от размера хэш таблицы

In [27]:
g = sns.lmplot(x="p", y="processed memory(KB)", hue="size", col="size",
               data=graphHashing_results_int, height=3, aspect=1, order=1)
In [28]:
g = sns.lmplot(x="p", y="processed memory(KB)", hue="size", col="size",
               data=graphHashing_results_string, height=3, aspect=1, order=1)
In [150]:
g = sns.lmplot(x="p", y="processed memory(KB)", hue="size", col="size",
               data=graphHashing_results_real, height=3, aspect=1, order=1)

Зависемость среднего времени поиска от размера хэш таблицы

In [29]:
g = sns.lmplot(x="p", y="average search time(microseconds/10^-6 sec)", hue="size", col="size",
               data=graphHashing_results_int, height=3, aspect=1, order=1)
In [151]:
g = sns.lmplot(x="p", y="average search time(microseconds/10^-6 sec)", hue="size", col="size",
               data=graphHashing_results_string, height=3, aspect=1, order=1)
In [152]:
g = sns.lmplot(x="p", y="average search time(microseconds/10^-6 sec)", hue="size", col="size",
               data=graphHashing_results_real, height=3, aspect=1, order=1)

Зависемости от размеров входных данных (p = const)

Зависемость времени построения от размера хэш таблицы

In [30]:
g = sns.lmplot(x="size", y="hashing time(microseconds/10^-6 sec)", hue="p", col="p",
               data=graphHashing_results_int, height=3, aspect=1, order=1)
In [31]:
g = sns.lmplot(x="size", y="hashing time(microseconds/10^-6 sec)", hue="p", col="p",
               data=graphHashing_results_string, height=3, aspect=1, order=1)
In [153]:
g = sns.lmplot(x="size", y="hashing time(microseconds/10^-6 sec)", hue="p", col="p",
               data=graphHashing_results_real, height=3, aspect=1, order=1)

Зависемость затраченной памяти от размера хэш таблицы

In [32]:
g = sns.lmplot(x="size", y="processed memory(KB)", hue="p", col="p",
               data=graphHashing_results_int, height=3, aspect=1, order=1)
In [33]:
g = sns.lmplot(x="size", y="processed memory(KB)", hue="p", col="p",
               data=graphHashing_results_string, height=3, aspect=1, order=1)
In [154]:
g = sns.lmplot(x="size", y="processed memory(KB)", hue="p", col="p",
               data=graphHashing_results_real, height=3, aspect=1, order=1)

Зависемость среднего времени поиска от размера хэш таблицы

In [34]:
g = sns.lmplot(x="size", y="average search time(microseconds/10^-6 sec)", hue="p", col="p",
               data=graphHashing_results_int, height=3, aspect=1, order=1)
In [156]:
g = sns.lmplot(x="size", y="average search time(microseconds/10^-6 sec)", hue="p", col="p",
               data=graphHashing_results_string, height=3, aspect=1, order=1)
In [157]:
g = sns.lmplot(x="size", y="average search time(microseconds/10^-6 sec)", hue="p", col="p",
               data=graphHashing_results_real, height=3, aspect=1, order=1)

Сравнение

In [112]:
def getBestSolution(results, col_name, sizes):
    avg_values = []
    for size in sizes:
        min_value = results[results['size'] == size]['hashing time(microseconds/10^-6 sec)'].min()
        table = results[results['hashing time(microseconds/10^-6 sec)'] == min_value]
        best_value = table[col_name].values[0]
        avg_values.append(best_value)
    return avg_values
In [124]:
def draw(col_name, arr1, arr2, arr3, sizes=None):
    
    if sizes is None:
        sizes = idealHashing_results_int['size'].unique()

    avg_search_values_idealHashing = getBestSolution(arr1, col_name, sizes)
    avg_search_values_twoTieredApproach = getBestSolution(arr2, col_name, sizes)
    avg_search_values_graphHashing = getBestSolution(arr3, col_name, sizes)
    
    df = pd.DataFrame([sizes,
                       avg_search_values_idealHashing,
                       avg_search_values_twoTieredApproach,
                       avg_search_values_graphHashing],
                      index=['size', 'Ideal Hashing', 'Two tiered approach', 'Graph hashing']).T

    if col_name == 'average search time(microseconds/10^-6 sec)':
        sns.lmplot(x="size", y="Ideal Hashing", data=df, order=1)
    else:
        sns.lmplot(x="size", y="Ideal Hashing", data=df, order=2)

    sns.lmplot(x="size", y="Two tiered approach", data=df, order=1)
    sns.lmplot(x="size", y="Graph hashing", data=df, order=1)
    plt.figure(figsize=(20, 10))
    g = sns.lineplot(data=df[ ['Ideal Hashing', 'Two tiered approach', 'Graph hashing'] ],
                     markers=True, dashes=False, linewidth=5, markersize=10)
    g.tick_params(labelsize=16)
    g.set_xlabel("Data size",fontsize=16)
    g.set_ylabel(col_name,fontsize=16)
    plt.setp(g.get_legend().get_texts(), fontsize='16')
    plt.show()

Время хеширования

In [125]:
draw('hashing time(microseconds/10^-6 sec)', idealHashing_results_int, twoTieredApproach_results_int, graphHashing_results_int)
In [126]:
draw('hashing time(microseconds/10^-6 sec)',
     idealHashing_results_string,
     twoTieredApproach_results_string,
     graphHashing_results_string)
In [158]:
draw('hashing time(microseconds/10^-6 sec)',
     idealHashing_results_real,
     twoTieredApproach_results_real,
     graphHashing_results_real)

Время поиска

In [127]:
draw('average search time(microseconds/10^-6 sec)', idealHashing_results_int, twoTieredApproach_results_int, graphHashing_results_int)
In [128]:
draw('average search time(microseconds/10^-6 sec)',
     idealHashing_results_string,
     twoTieredApproach_results_string,
     graphHashing_results_string)
In [159]:
draw('average search time(microseconds/10^-6 sec)',
     idealHashing_results_real,
     twoTieredApproach_results_real,
     graphHashing_results_real)

Память

In [129]:
draw("processed memory(KB)", idealHashing_results_int, twoTieredApproach_results_int, graphHashing_results_int)
In [130]:
draw("processed memory(KB)",
     idealHashing_results_string,
     twoTieredApproach_results_string,
     graphHashing_results_string)
In [160]:
draw("processed memory(KB)",
     idealHashing_results_real,
     twoTieredApproach_results_real,
     graphHashing_results_real)